Positive-Unlabeled (PU) learning aims to learn a model from rare positive samples and abundant unlabeled samples. Compared with classical binary classification, PU learning is much more challenging due to the existence of many incompletely-annotated data instances. Since only a small set of the most confident positive samples is available and there is not enough evidence to categorize the remaining samples, many of the unlabeled data may also be positive. Research on this topic is particularly useful and essential for many real-world tasks with very expensive labeling costs. For example, recognition tasks in disease diagnosis, recommendation systems, and satellite image recognition may have only a few positive samples that can be annotated by experts. Existing methods largely ignore the intrinsic hardness of some unlabeled data, which can result in sub-optimal performance as a consequence of fitting easy noisy data while not sufficiently exploiting hard data. In this paper, we focus on improving the commonly-used nnPU objective with a novel training pipeline. We highlight the intrinsic difference in hardness among samples in the dataset and the proper learning strategies for easy and hard data. Accordingly, we propose to first split the unlabeled dataset with an early-stopping strategy: samples with inconsistent predictions between the temporary and base models are considered hard. The model then applies a noise-tolerant Jensen-Shannon divergence loss to easy data, and a dual-source consistency regularization to hard data, which includes a cross-consistency between the student and base models for low-level features and a self-consistency for high-level features and predictions.
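To make the nnPU starting point concrete, here is a minimal PyTorch sketch of the non-negative PU risk estimator (Kiryo et al., 2017) that this pipeline builds on; the function name and the softplus surrogate loss are illustrative choices, not the authors' code.

```python
# A minimal sketch of the non-negative PU (nnPU) risk estimator.
import torch
import torch.nn.functional as F

def nnpu_risk(logits_pos, logits_unl, prior, loss_fn=F.softplus):
    """logits_pos: model outputs on labeled positive samples;
    logits_unl: model outputs on unlabeled samples;
    prior: class prior pi = P(y = +1), assumed known or estimated;
    loss_fn: surrogate loss; softplus(-z) is the logistic loss for label +1."""
    risk_pos = loss_fn(-logits_pos).mean()      # l(g(x), +1) on positives
    risk_pos_neg = loss_fn(logits_pos).mean()   # l(g(x), -1) on positives
    risk_unl_neg = loss_fn(logits_unl).mean()   # l(g(x), -1) on unlabeled
    # Unbiased negative risk, clipped at zero to avoid overfitting.
    neg_risk = risk_unl_neg - prior * risk_pos_neg
    return prior * risk_pos + torch.clamp(neg_risk, min=0.0)
```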
Few-shot learning (FSL) aims to transfer the knowledge learned from base categories with sufficient labeled data to novel categories with scarce known information. It is currently an important research question with great practical value in real-world applications. Despite extensive previous efforts on few-shot learning tasks, we emphasize that most existing methods do not take into account the distributional shift caused by sample selection bias in the FSL scenario. Such a selection bias can induce spurious correlations between the semantic causal features, which are causally and semantically related to the class label, and the other non-causal features. Critically, the former should be invariant across changes in distribution, highly related to the classes of interest, and thus generalizable to novel classes, while the latter are not stable under distribution changes. To resolve this problem, we propose a novel data augmentation strategy, dubbed PatchMix, that can break this spurious dependency by replacing patch-level information and supervision of the query images with random gallery images drawn from classes different from those of the queries. We theoretically show that such an augmentation mechanism, unlike existing ones, is able to identify the causal features. To further make these features discriminative enough for classification, we propose a Correlation-guided Reconstruction (CGR) module and a Hardness-Aware module for instance discrimination and easier discrimination between similar classes. Moreover, the framework can be adapted to the unsupervised FSL scenario.
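As a rough illustration of what a PatchMix-style augmentation could look like, the following sketch replaces random grid cells of a query image with the corresponding cells of a gallery image from another class and mixes the labels in proportion to the replaced area; the grid-based patching, names, and ratio parameter are our assumptions, not the paper's implementation.

```python
# A hypothetical PatchMix-style augmentation sketch.
import torch

def patch_mix(query, gallery, y_query, y_gallery, patch=16, ratio=0.5):
    """query, gallery: (C, H, W) tensors; labels are class indices."""
    c, h, w = query.shape
    mixed = query.clone()
    gh, gw = h // patch, w // patch
    n_cells = gh * gw
    n_replace = int(ratio * n_cells)
    # Choose random grid cells to overwrite with gallery content.
    for cell in torch.randperm(n_cells)[:n_replace].tolist():
        i, j = divmod(cell, gw)
        ys, xs = i * patch, j * patch
        mixed[:, ys:ys + patch, xs:xs + patch] = \
            gallery[:, ys:ys + patch, xs:xs + patch]
    lam = 1.0 - n_replace / n_cells  # remaining weight of the query label
    return mixed, (y_query, y_gallery, lam)
```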
In the past few years, convolutional neural network (CNN) based crowd counting methods have achieved promising results. However, the scale variation problem remains a great challenge for accurate count estimation. In this paper, we propose a multi-scale feature aggregation network (MSFANet) that can alleviate this problem to some extent. Specifically, our approach consists of two feature aggregation modules: short aggregation (ShortAgg) and skip aggregation (SkipAgg). The ShortAgg module aggregates the features of adjacent convolutional blocks; its purpose is to make features with different receptive fields fuse gradually from the bottom of the network. The SkipAgg module directly propagates features with small receptive fields to features with much larger receptive fields; its purpose is to promote the fusion of small- and large-receptive-field features. In particular, the SkipAgg module introduces local self-attention features from Swin Transformer blocks to incorporate rich spatial information. Furthermore, we propose a local-and-global counting loss that accounts for the non-uniform crowd distribution. Extensive experiments on four challenging datasets (the ShanghaiTech, UCF_CC_50, UCF-QNRF, and WorldExpo'10 datasets) show that the proposed easy-to-implement MSFANet achieves promising results compared with previous state-of-the-art methods.
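The two aggregation ideas can be sketched as follows, assuming simple 1x1-convolution fusion; the actual MSFANet modules are more elaborate (e.g., SkipAgg incorporates Swin Transformer blocks), so this is only a schematic.

```python
# Schematic sketches of the two aggregation modules; all layer choices
# here are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShortAgg(nn.Module):
    """Fuse features of two adjacent convolutional blocks."""
    def __init__(self, c_prev, c_cur):
        super().__init__()
        self.proj = nn.Conv2d(c_prev + c_cur, c_cur, kernel_size=1)

    def forward(self, f_prev, f_cur):
        # Resize the earlier feature map to match the current stage.
        f_prev = F.interpolate(f_prev, size=f_cur.shape[-2:],
                               mode='bilinear', align_corners=False)
        return self.proj(torch.cat([f_prev, f_cur], dim=1))

class SkipAgg(nn.Module):
    """Propagate small-receptive-field features to a much deeper stage."""
    def __init__(self, c_shallow, c_deep):
        super().__init__()
        self.proj = nn.Conv2d(c_shallow, c_deep, kernel_size=1)

    def forward(self, f_shallow, f_deep):
        f_shallow = F.interpolate(f_shallow, size=f_deep.shape[-2:],
                                  mode='bilinear', align_corners=False)
        return f_deep + self.proj(f_shallow)
```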
Lane detection is an important component of many practical autonomous systems. Although a wide variety of lane detection methods have been proposed, and steady improvements on benchmarks have been reported over time, lane detection remains an unsolved problem. This is because most existing methods treat lane detection either as a dense prediction task or as a detection task, and few of them consider the unique topologies of lane markings (Y-shaped, fork-shaped, nearly horizontal lanes), which leads to sub-optimal solutions. In this paper, we propose a new method of lane detection based on relay chain prediction. Specifically, our model predicts a segmentation map to classify foreground and background regions. For each pixel in the foreground region, we traverse a forward branch and a backward branch to recover the whole lane. Each branch decodes a transfer map and a distance map to produce the direction of the move to the next point, and the number of steps to progressively predict the relay station (the next point). As such, our model is able to capture key points along the lanes. Despite its simplicity, our strategy allows us to establish new state-of-the-art results on four major benchmarks including TuSimple, CULane, CurveLanes, and LLAMAS.
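The relay-chain decoding step might be sketched as below: starting from a foreground pixel, each iteration reads a direction from the transfer map and a step count from the distance map to jump to the next relay station. The map formats and the stopping rule are assumptions for illustration, not the paper's code.

```python
# A schematic sketch of relay-chain decoding for one branch.
import numpy as np

def trace_branch(start, transfer_map, distance_map, max_steps=50):
    """start: (row, col) foreground pixel; transfer_map: (H, W, 2) unit
    direction vectors; distance_map: (H, W) pixels to the next relay point."""
    h, w = distance_map.shape
    point, chain = np.array(start, dtype=float), [tuple(start)]
    for _ in range(max_steps):
        r, c = int(point[0]), int(point[1])
        step = distance_map[r, c]
        if step < 1.0:                       # no further relay station
            break
        point = point + step * transfer_map[r, c]
        if not (0 <= point[0] < h and 0 <= point[1] < w):
            break                            # left the image: stop tracing
        chain.append((float(point[0]), float(point[1])))
    return chain  # key points recovered along the lane
```

Running this once with the forward-branch maps and once with the backward-branch maps, then concatenating the two chains, would recover the whole lane.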
Recovering color images and videos from highly undersampled data is a fundamental and challenging task in face recognition and computer vision. Owing to the multi-dimensional nature of color images and videos, in this paper we propose a novel tensor completion approach that can efficiently explore the sparsity of tensor data under the discrete cosine transform (DCT). Specifically, we introduce two "sparse plus low-rank" tensor completion models, as well as two implementable algorithms for finding their solutions. The first is a DCT-based sparse plus weighted nuclear norm induced low-rank minimization model. The second is a DCT-based sparse plus $p$-shrinkage mapping induced low-rank optimization model. Moreover, we accordingly propose two implementable augmented Lagrangian algorithms to solve the underlying optimization models. A series of numerical experiments, including color image inpainting and video data recovery, demonstrate that our proposed approaches perform better than many existing state-of-the-art tensor completion methods, especially when the ratio of missing data is high.
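One plausible way to write the first model, reconstructed from the description above rather than taken from the paper, is the following; here $\mathcal{D}$ denotes the DCT, $\Omega$ indexes the observed entries of the data tensor $\mathcal{M}$, and $w$ are the nuclear-norm weights.

```latex
% Reconstructed "sparse plus weighted nuclear norm" completion model
% (our notation, not necessarily the paper's exact formulation).
\min_{\mathcal{L},\,\mathcal{S}}\;
  \|\mathcal{L}\|_{w,*} \;+\; \lambda\,\|\mathcal{D}(\mathcal{S})\|_{1}
\qquad \text{s.t.} \qquad
  P_{\Omega}(\mathcal{L} + \mathcal{S}) = P_{\Omega}(\mathcal{M})
```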
Despite significant progress in object categorization in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited-size class vocabularies and typically requires separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above-mentioned challenges and to address the problems of supervised, zero-shot, generalized zero-shot, and open-set recognition in a unified framework. Specifically, we propose a weighted maximum-margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. The distance constraints ensure that labeled samples are projected closer to their correct prototypes in the embedding space than to others. We show that the resulting model improves supervised, zero-shot, generalized zero-shot, and large open-set recognition, with up to a 310K-class vocabulary, on the Animals with Attributes and ImageNet datasets.
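A simplified version of such a distance constraint can be written as a margin loss over embedding-to-prototype distances; the weighting scheme of the full objective is omitted, and all names below are illustrative assumptions.

```python
# A simplified margin loss over vocabulary-atom distances (illustrative).
import torch

def vocab_margin_loss(z, proto_idx, prototypes, margin=0.1):
    """z: (B, D) embedded samples; prototypes: (V, D) vocabulary atoms;
    proto_idx: (B,) long tensor, index of each sample's correct prototype."""
    d = torch.cdist(z, prototypes)               # (B, V) pairwise distances
    d_pos = d.gather(1, proto_idx[:, None])      # distance to correct atom
    # Hinge: the correct prototype must beat every other atom by `margin`.
    viol = torch.clamp(margin + d_pos - d, min=0.0)
    viol.scatter_(1, proto_idx[:, None], 0.0)    # ignore the positive itself
    return viol.mean()
```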
A noisy training set usually leads to the degradation of the generalization and robustness of neural networks. In this paper, we propose a novel, theoretically guaranteed clean-sample selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method to model the linear relation between network features and one-hot labels. In SPR, the clean data are identified by the zero mean-shift parameters solved in the regression model. We theoretically show that SPR can recover clean data under some conditions. In general scenarios, however, these conditions may no longer be satisfied, and some noisy data are falsely selected as clean. To solve this problem, we propose a data-adaptive method, Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which provably controls the False-Selection-Rate (FSR) among the selected clean data. To improve efficiency, we further present a splitting algorithm that divides the whole training set into small pieces that can be solved in parallel, making the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models will be released.
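A toy sketch of the mean-shift idea behind SPR follows: each sample gets its own shift parameter, penalized toward zero, and samples whose solved shift is (near) zero are treated as clean. The alternating soft-thresholding solver below is our illustration, not the paper's algorithm.

```python
# Toy mean-shift clean-sample selection (illustrative solver).
import numpy as np

def spr_select(X, Y, lam=0.5, n_iter=100):
    """X: (n, d) network features; Y: (n, c) one-hot labels."""
    n, c = Y.shape
    gamma = np.zeros((n, c))                      # per-sample mean shifts
    for _ in range(n_iter):
        # Least-squares fit of the regression weights given current shifts.
        beta, *_ = np.linalg.lstsq(X, Y - gamma, rcond=None)
        resid = Y - X @ beta
        # Row-wise (group) soft-thresholding of the residuals.
        norms = np.linalg.norm(resid, axis=1, keepdims=True)
        shrink = np.clip(1.0 - lam / np.maximum(norms, 1e-12), 0.0, None)
        gamma = shrink * resid
    clean = np.linalg.norm(gamma, axis=1) < 1e-6  # zero-shift => clean
    return clean, gamma
```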
This paper introduces a new few-shot learning pipeline that casts relevance ranking for image retrieval as binary ranking-relation classification. In comparison to image classification, ranking-relation classification is sample-efficient and domain-agnostic. Besides, it provides a new perspective on few-shot learning and is complementary to state-of-the-art methods. The core component of our deep neural network is a simple MLP, which takes as input an image triplet encoded as the difference between two vector-Kronecker products and outputs a binary relevance ranking order. The proposed RankMLP can be built on top of any state-of-the-art feature extractor, and the entire deep neural network is called the ranking deep neural network, or RankDNN. Meanwhile, RankDNN can be flexibly fused with other post-processing methods. During meta-testing, RankDNN ranks support images according to their similarity with the query samples, and each query sample is assigned the class label of its nearest neighbor. Experiments demonstrate that RankDNN can effectively improve the performance of its baselines based on a variety of backbones, and that it outperforms previous state-of-the-art algorithms on multiple few-shot learning benchmarks, including miniImageNet, tieredImageNet, Caltech-UCSD Birds, and CIFAR-FS. Furthermore, experiments on the cross-domain challenge demonstrate the superior transferability of RankDNN. The code is available at: https://github.com/guoqianyu-alberta/RankDNN.
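The triplet encoding is straightforward to sketch: a (query, candidate A, candidate B) triplet becomes the difference of two vector-Kronecker products, and a small MLP classifies which candidate ranks higher. Dimensions and layer sizes below are illustrative assumptions.

```python
# A minimal sketch of Kronecker-difference ranking-relation classification.
import torch
import torch.nn as nn

def kron_diff(q, a, b):
    """q, a, b: (D,) feature vectors -> (D*D,) ranking-relation encoding."""
    return torch.kron(q, a) - torch.kron(q, b)

rank_mlp = nn.Sequential(          # binary relevance-ranking classifier
    nn.Linear(64 * 64, 256), nn.ReLU(),
    nn.Linear(256, 1),             # >0 means "a ranks above b" for query q
)

q, a, b = (torch.randn(64) for _ in range(3))
score = rank_mlp(kron_diff(q, a, b))
```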
Deep learning models for vision tasks are trained on large datasets under the assumption that there exists a universal representation that can be used to make predictions for all samples. While high-complexity models have proven capable of learning such representations, a mixture of experts, each trained on a specific subset of the data, can infer the labels more efficiently. However, using mixtures of experts poses two new problems: (i) assigning the correct expert when presented with a new unseen sample, and (ii) finding the optimal partitioning of the training data such that the experts rely the least on common features. Dynamic Routing (DR) proposed a novel architecture in which each layer is composed of a set of experts; however, we demonstrate that without addressing these two challenges, the model can collapse to using the same subset of experts. In our method, Diversified Dynamic Routing (DivDR), the model is explicitly trained to find data-dependent partitions and to assign the correct experts in an unsupervised manner. We conduct several experiments on semantic segmentation on Cityscapes and on object detection and instance segmentation on MS-COCO, showing improved performance over several baselines.
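To make the routing problem concrete, a generic soft mixture-of-experts layer might look like the sketch below; DivDR's diversity-oriented training on top of such a layer is not shown, and all names are illustrative.

```python
# A generic soft mixture-of-experts layer with a learned gate (illustrative).
import torch
import torch.nn as nn

class RoutedLayer(nn.Module):
    def __init__(self, dim, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
            for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)

    def forward(self, x):
        # Soft assignment of each sample to the experts; a collapsed gate
        # (same experts for everything) is the failure mode DivDR targets.
        weights = torch.softmax(self.gate(x), dim=-1)            # (B, E)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)
```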
Recent progress in 4D implicit representations has focused on globally controlling the shape and motion with low-dimensional latent vectors, which is prone to missing surface details and accumulating tracking errors. While many deep local representations have shown promising results for 3D shape modeling, their 4D counterpart does not yet exist. In this paper, we fill this gap by proposing a novel local 4D implicit representation for dynamic clothed humans, named LoRD, which has the merits of both 4D human modeling and local representations, and enables high-fidelity reconstruction with detailed surface deformations such as clothing wrinkles. In particular, our key insight is to encourage the network to learn latent codes of local part-level representations, capable of explaining the local geometry and temporal deformations. For inference at test time, we first estimate the inner body skeleton motion to track local parts at each time step, and then optimize the latent codes of each part via auto-decoding, based on different types of observed data. Extensive experiments demonstrate that the proposed method has a strong capability to represent 4D humans and outperforms state-of-the-art methods on practical applications, including 4D reconstruction from sparse points and non-rigid depth fusion, both qualitatively and quantitatively.
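Test-time auto-decoding, as described above, can be sketched as optimizing the latent codes of a frozen decoder against whatever observations are available (e.g., sparse points); the decoder interface and loss below are placeholders, not the paper's formulation.

```python
# A schematic sketch of test-time auto-decoding of per-part latent codes.
import torch

def auto_decode(decoder, codes, observations, steps=200, lr=1e-2):
    """decoder: frozen network mapping (codes, query points) -> SDF values
    (a placeholder interface); codes: (P, Z) per-part latent codes;
    observations: (N, 3) observed surface points."""
    codes = codes.clone().requires_grad_(True)
    opt = torch.optim.Adam([codes], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Observed surface points should evaluate to near-zero signed distance.
        sdf = decoder(codes, observations)
        loss = sdf.abs().mean()
        loss.backward()
        opt.step()
    return codes.detach()
```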